A Flexible Representation of Heterogeneous Annotation Data

Authors

  • Richard Johansson
  • Alessandro Moschitti
Abstract

This paper describes a new flexible representation for the annotation of complex structures of metadata over heterogeneous data collections containing text and other types of media such as images or audio files. We argue that existing frameworks are not suitable for this purpose, most importantly because they do not easily generalize to multi-document and multimodal corpora, and because they often require the use of particular software frameworks. In the paper, we define a data model to represent such structured data over multimodal collections. Furthermore, we define a surface realization of the data structure as a simple and readable XML format. We present two examples of annotation tasks to illustrate how the representation and format work for complex structures involving multimodal annotation and cross-document links. The representation described here has been used in a large-scale project focusing on the annotation of a wide range of information – from low-level features to high-level semantics – in a multimodal data collection containing both text and images.

Similar resources

Accessing Heterogeneous Linguistic Data — Generic XML-based Representation and Flexible Visualization

Annotation of linguistic data increasingly focuses on information beyond the (morpho-)syntactic level. Moreover, annotated data of less-studied languages is growing in importance. To maximally profit from this data, straightforward and user-friendly access has to be provided. In this paper, we describe a linguistic database that is accessed via a web browser and offers flexible visualization of...

Flexible Integration of Molecular-Biological Annotation Data: The GenMapper Approach

Molecular-biological annotation data is continuously being collected, curated and made accessible in numerous public data sources. Integration of this data is a major challenge in bioinformatics. We present the GenMapper system that physically integrates heterogeneous annotation data in a flexible way and supports large-scale analysis on the integrated data. It uses a generic data model to unif...

Consistent and Flexible Integration of Morphological Annotation in the Arabic Treebank

Treebank Annotation Issue: Multiple Levels of Annotation
  • Annotation not on the source text, but more abstract representation
  • How to maintain annotation consistency and relation between different levels?
  • How to make available the multiple levels of representation for the user?
Arabic Treebank as a case study:
  • Mapping between two levels of annotation:
  • Morphological analysis of source te...

Cost-Based Query Optimization in a Heterogeneous Distributed Semi-Structured Environment

How to efficiently process queries in a heterogeneous and distributed data integration environment is an interesting and unsolved topic. Our research project proposes an approach for providing a generic cost framework for query optimization in an XML-based mediation system called XLive, which integrates heterogeneous data sources. Our approach relies on cost annotation on an XQuery logical rep...

Publication date: 2010